Search for: All records

Creators/Authors contains: "Hossjer, Ola"


  1. Many data sources, from tracking of social behavior to election polling to testing studies for understanding disease spread, are subject to sampling bias whose implications are not yet fully understood. In this paper we study estimation of a given feature (such as disease, or behavior on social media platforms) from biased samples, treating non-respondent individuals as missing data. Prevalence of the feature among sampled individuals has an upward bias under the assumption of individuals’ willingness to be sampled. This can be viewed as a regression model with symptoms as covariates and the feature as outcome. It is assumed that the outcome is unknown at the time of sampling, and therefore the missingness mechanism depends only on the covariates. We show that, in spite of this, data is missing at random only when the sizes of symptom classes in the population are known; otherwise data is missing not at random. From an information-theoretic viewpoint, we show that sampling bias corresponds to external information due to individuals in the population knowing their covariates, and we quantify this external information by active information. The reduction in prevalence, when sampling bias is adjusted for, similarly translates into active information due to bias correction, with opposite sign to active information due to testing bias. We develop unified results that show that prevalence and active information estimates are asymptotically normal under all missing data mechanisms, when testing errors are absent and present respectively. The asymptotic behavior of the estimators is illustrated through simulations.
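The relation between sampling bias and active information described in this abstract can be illustrated with a small numerical sketch. It is not the paper's estimator. It assumes two hypothetical symptom classes with known population sizes, class-specific sampling probabilities, and class-specific prevalences (all numbers invented for illustration), and it takes active information to be the log-ratio of two probabilities, a common definition that the abstract itself does not spell out.

```python
import math


def true_prevalence(class_sizes, class_prevalence):
    """Population prevalence: class prevalences weighted by known class sizes."""
    total = sum(class_sizes)
    return sum(n * p for n, p in zip(class_sizes, class_prevalence)) / total


def naive_prevalence(class_sizes, sampling_probs, class_prevalence):
    """Expected prevalence among sampled individuals.

    Each class contributes in proportion to size * sampling probability,
    so classes that are more willing to be sampled are over-represented.
    """
    weights = [n * s for n, s in zip(class_sizes, sampling_probs)]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, class_prevalence)) / total


def active_information(p, p0):
    """Log-ratio of a probability p to a baseline p0 (assumed definition)."""
    return math.log(p / p0)


# Hypothetical population: [asymptomatic, symptomatic].
class_sizes = [900, 100]
sampling_probs = [0.1, 0.5]      # symptomatic individuals are sampled more often
class_prevalence = [0.02, 0.40]  # the feature is more common among symptomatic

p_true = true_prevalence(class_sizes, class_prevalence)
p_naive = naive_prevalence(class_sizes, sampling_probs, class_prevalence)

info_sampling_bias = active_information(p_naive, p_true)  # positive: upward bias
info_correction = active_information(p_true, p_naive)     # same size, opposite sign

print(f"true prevalence:  {p_true:.4f}")
print(f"naive prevalence: {p_naive:.4f}")
print(f"active information (sampling bias):   {info_sampling_bias:+.4f}")
print(f"active information (bias correction): {info_correction:+.4f}")
```

In this toy setting the correction is possible only because the class sizes are supplied, which mirrors the abstract's point that data is missing at random only when the sizes of the symptom classes are known.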
  2. Sampling for prevalence estimation of infection is subject to bias from both oversampling of symptomatic individuals and error-prone tests. This results in naïve estimators of prevalence (i.e., the proportion of observed infected individuals in the sample) that can be very far from the true proportion of infected. In this work, we present a method of prevalence estimation that reduces the bias due to both testing errors and oversampling of symptomatic individuals, eliminating it altogether in some scenarios. Moreover, this procedure considers stratified errors in which tests have different error rate profiles for symptomatic and asymptomatic individuals. This results in easily implementable algorithms, for which code is provided, that produce better prevalence estimates than other methods (in terms of reducing and/or removing bias), as demonstrated by formal results, simulations, and COVID-19 data from the Israeli Ministry of Health.
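The two bias sources named in this abstract can be illustrated with a minimal sketch. It does not reproduce the authors' algorithm or their published code. Instead it swaps in two standard techniques: the Rogan-Gladen correction for test errors, applied separately in each symptom class, followed by reweighting with known population shares of the classes. The function names, counts, error rates, and shares below are hypothetical.

```python
def rogan_gladen(observed_rate, sensitivity, specificity):
    """Correct an observed positive rate for imperfect test accuracy.

    Standard Rogan-Gladen formula, clipped to the interval [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must be better than chance (sens + spec > 1)")
    estimate = (observed_rate + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))


def stratified_prevalence(positives, tested, sensitivity, specificity, population_share):
    """Correct each symptom class for test error, then reweight.

    All arguments are sequences with one entry per symptom class.
    population_share holds the assumed known class shares in the population,
    which removes the effect of oversampling symptomatic individuals.
    """
    if abs(sum(population_share) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    total = 0.0
    for pos, n, se, sp, share in zip(
        positives, tested, sensitivity, specificity, population_share
    ):
        class_estimate = rogan_gladen(pos / n, se, sp)
        total += share * class_estimate
    return total


# Hypothetical data: [symptomatic, asymptomatic].
positives = [210, 40]
tested = [500, 500]              # symptomatic are half the sample...
population_share = [0.10, 0.90]  # ...but only 10% of the population
sensitivity = [0.90, 0.70]       # stratified error profiles (assumed values)
specificity = [0.95, 0.98]

naive = sum(positives) / sum(tested)
corrected = stratified_prevalence(
    positives, tested, sensitivity, specificity, population_share
)

print(f"naive prevalence:     {naive:.4f}")
print(f"corrected prevalence: {corrected:.4f}")
```

With these invented numbers the naive estimate is pulled upward by the over-represented symptomatic class, and the stratified correction lowers it. How close such a correction comes to the true prevalence depends on how well the error rates and class shares are known.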